Abstract



This erratum points out an error in the simplified drift theorem (SDT) 
[14, 15]. It is also shown that a minor modification of one of its conditions
is sufficient to establish a valid result. In many respects, the new theorem is
more general than before. We no longer assume a Markov process or a finite
search space. Furthermore, the proof of the theorem is more compact than
the previous ones. Finally, previous applications of the SDT are revisited.
It turns out that all of these meet the modified condition either directly or
by means of a few additional arguments.

1 Introduction 

The so-called simplified drift theorem, first presented in [14], deals with stochastic processes that drift away from a target, i.e., processes that in expectation increase the distance from the target in a step. For example, if $X_t \ge 0$ is the state of the process at time t and the aim is to reach the target state 0, then the SDT deals with the case $E(X_{t+1} - X_t) \ge \varepsilon$. (Sometimes this is called negative drift, if perspectives are switched and the aim is to reach a maximal state.)

The aim is to show that the process takes a long time to reach the target. 
Intuitively, a drift away from the target is not enough for this. It might well be 
that the expected change is positive, but a direct jump to the optimum occurs 
with a fair probability. Therefore, the simplified drift theorem contains a second 
condition in addition to the drift condition, namely an exponential decay of the 
probability of jumping towards the target, more precisely a condition of the kind 
$\mathrm{Prob}(X_{t+1} - X_t \le -j) \le r/(1+\delta)^j$ for all natural $j \ge 0$. See [14] and the extended
journal version [15] for a precise formulation.

The SDT has found many applications and has had a large impact on the way lower bounds on the optimization time of randomized search heuristics are proved. Unfortunately, we recently discovered an error in its proof. In fact, all existing variants of the SDT presented before, in any case the ones in [14, 15], seem to be wrong unless the second condition is strengthened to something essentially like $\mathrm{Prob}(|X_{t+1} - X_t| \ge j) \le r/(1+\delta)^j$, i.e., we should also have an exponential decay for the jumps away from the target (note the absolute value).

This erratum is structured as follows. We will present a counterexample to the previous SDT in Section 2. In Section 3, we will present and prove a corrected version of the SDT. In Section 4, we show that seemingly all applications of the original SDT also satisfy the stronger conditions, either immediately or after a few additional arguments.

2 An Example Where the Original Theorem is Wrong

Consider the SDT as presented in [15] (which essentially is the new Theorem 2 in Section 3 but with the weaker second condition $\mathrm{Prob}(X_{t+1} - X_t \le -j \mid a < X_t) \le r(\ell)/(1+\delta)^j$).

Let us look into the following Markov process on some state space that is exponentially large in n, say $S = \{-3e^n, \dots, 3e^n\}$ (the precise size does not matter, but the original drift theorem demands a finite search space). We set $a := 0$ as target and $b := n$ as starting point, i.e., $X_0 = b = n$.

The stochastic behavior of the process is as follows: Conditioned on $X_t \in\, ]a, b]$, we have $X_{t+1} = X_t + 2e^n$ with probability $e^{-n}$, and $X_{t+1} = X_t - 1$ with the remaining probability $1 - e^{-n}$. Note that the steps towards the target only have size 1. The behavior in the case $\{X_t \le a \vee X_t > b\}$ is not important, say $X_{t+1} = X_t$ then (the process stops).

We get $E(X_{t+1} - X_t \mid X_t \in\, ]a,b[) = e^{-n} \cdot 2e^n + (1 - e^{-n}) \cdot (-1) \ge 1$, hence there is constant drift away from the target towards b within the drift interval.

However, it is very likely (probability at least $1 - ne^{-n}$) that, starting from b, the process takes n decreasing steps in a row and reaches a.
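The counterexample is easy to reproduce numerically. The Python sketch below evaluates three things. It computes the exact drift $1 + e^{-n}$ and the probability $(1-e^{-n})^n$ of n consecutive decreasing steps. It also simulates the process for a small n, with a = 0 and b = n as in the text. The function names are ours, chosen for illustration.

```python
import math
import random


def expected_drift(n: int) -> float:
    """Exact E(X_{t+1} - X_t | a < X_t < b) of the counterexample, i.e. 1 + e^{-n}."""
    p_jump = math.exp(-n)                      # rare jump of size 2e^n away from a
    return p_jump * 2 * math.exp(n) + (1 - p_jump) * (-1)


def prob_direct_descent(n: int) -> float:
    """Probability that the first n steps are all -1, so X walks from b = n to a = 0."""
    return (1 - math.exp(-n)) ** n


def hits_target(n: int, rng: random.Random) -> bool:
    """Simulate from X_0 = b = n until X leaves ]a, b] = ]0, n]; True if a = 0 is reached."""
    x = float(n)
    while 0 < x <= n:
        if rng.random() < math.exp(-n):
            x += 2 * math.exp(n)               # huge step away from the target
        else:
            x -= 1                             # unit step towards the target
    return x <= 0


if __name__ == "__main__":
    n = 10
    rng = random.Random(1)
    runs = 10_000
    hits = sum(hits_target(n, rng) for _ in range(runs))
    print(f"drift = {expected_drift(n):.6f}")
    print(f"P(n decreasing steps) = {prob_direct_descent(n):.6f}")
    print(f"empirical hitting frequency = {hits / runs:.4f}")
```

With n = 10, essentially every run reaches a, even though the drift points away from it.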

The "proof" of the SDT presented in [15] erroneously bounds a double sum appearing in the moment-generating function of the drift from above. More precisely, all single terms are bounded uniformly without paying attention to their sign.
3 The Corrected Version



Our aim is to present a corrected simplified drift theorem, which as before deals with drift away from the target and holds in both discrete and continuous search spaces. To this end, the following lemma will be useful.

Lemma 1 Let X be a random variable with minimum $x_{\min}$. Moreover, let $f\colon \mathbb{R} \to \mathbb{R}_{\ge 0}$ be a non-decreasing function and suppose that the expectation $E(f(X))$ exists. Then

$$E(f(X)) \le \sum_{i=0}^{\infty} f(x_{\min} + i + 1) \cdot \mathrm{Prob}(X \ge x_{\min} + i).$$

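Proof: We denote by $\mu$ the probability measure from the probability space $(\Omega, \Sigma, \mu)$ underlying X. Then the expectation is given by a Lebesgue integral, more precisely

$$E(f(X)) = \int_{\Omega} f(X(\omega))\, \mu(d\omega).$$

Since f is non-negative and non-decreasing and $X \ge x_{\min}$, we split $\Omega$ into the events $\{x_{\min} + i \le X < x_{\min} + i + 1\}$, $i \ge 0$. On each of these events, we bound f by $f(x_{\min} + i + 1)$ and then enlarge the event to $\{X \ge x_{\min} + i\}$. This yields

$$E(f(X)) \le \sum_{i=0}^{\infty} f(x_{\min} + i + 1) \int_{\{X \ge x_{\min} + i\}} \mu(d\omega) = \sum_{i=0}^{\infty} f(x_{\min} + i + 1) \cdot \mathrm{Prob}(X \ge x_{\min} + i).$$

□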
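Lemma 1 can be sanity-checked on finitely supported distributions. The sketch below compares $E(f(X))$ with the right-hand side of the lemma for $f(x) = e^{\gamma x}$, which is non-negative and non-decreasing. The helper names and the example distributions are our own illustrative choices.

```python
import math
from typing import Callable, Dict


def expectation(pmf: Dict[float, float], f: Callable[[float], float]) -> float:
    """E(f(X)) for a finitely supported distribution given as {value: probability}."""
    return sum(p * f(x) for x, p in pmf.items())


def lemma1_bound(pmf: Dict[float, float], f: Callable[[float], float]) -> float:
    """Right-hand side of Lemma 1: sum_{i>=0} f(x_min + i + 1) * Prob(X >= x_min + i).

    Terms with x_min + i > max(support) have zero tail probability and are skipped.
    """
    x_min, x_max = min(pmf), max(pmf)
    total, i = 0.0, 0
    while x_min + i <= x_max:
        tail = sum(p for x, p in pmf.items() if x >= x_min + i)
        total += f(x_min + i + 1) * tail
        i += 1
    return total


if __name__ == "__main__":
    def f(x: float) -> float:
        return math.exp(0.7 * x)               # non-negative and non-decreasing

    pmf = {0.0: 0.5, 0.3: 0.2, 2.5: 0.3}
    print(expectation(pmf, f), "<=", lemma1_bound(pmf, f))
```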

We will use the following drift theorem due to Hajek to prove our result. In contrast to [13], our presentation of Hajek's drift theorem does not make unnecessary assumptions such as non-negativity of the random variables or Markovian processes. As we are dealing with a stochastic process, we implicitly assume, though, that the random variables $X_t$, $t \ge 0$, are adapted to the natural filtration $\mathcal{F}_t = \sigma(X_0, \dots, X_t)$.

We no longer formulate the theorem using a "potential function" g mapping from some state space to the reals either. Instead, we assume w.l.o.g. that the random variables $X_t$ have already been obtained by this mapping.

Theorem 1 (Hajek [7]) Let $X_t$, $t \ge 0$, be real-valued random variables describing a stochastic process over some state space. Pick two real numbers $a(\ell)$ and $b(\ell)$ depending on a parameter $\ell$ such that $a(\ell) < b(\ell)$ holds. Let $T(\ell)$ be the random variable denoting the earliest point in time $t \ge 0$ such that $X_t \le a(\ell)$ holds. If there are $\lambda(\ell) > 0$ and $p(\ell) > 0$ such that the condition

$$E\left(e^{-\lambda(\ell) \cdot (X_{t+1} - X_t)} \;\middle|\; \mathcal{F}_t \wedge a(\ell) < X_t < b(\ell)\right) \le 1 - \frac{1}{p(\ell)} \qquad (*)$$

holds for all $t \ge 0$, then for all time bounds $L(\ell) \ge 0$

$$\mathrm{Prob}(T(\ell) \le L(\ell) \mid X_0 \ge b(\ell)) \le e^{-\lambda(\ell) \cdot (b(\ell) - a(\ell))} \cdot L(\ell) \cdot D(\ell) \cdot p(\ell),$$

where $D(\ell) = \max\left\{1, E\left(e^{-\lambda(\ell) \cdot (X_{t+1} - b(\ell))} \;\middle|\; \mathcal{F}_t \wedge X_t \ge b(\ell)\right)\right\}$.

We now present the corrected simplified drift theorem. As discussed above, it combines a drift away from the target with a condition on exponentially decaying probabilities for large jumps both towards and away from the target. Nevertheless, the theorem has become more general in other respects. More precisely, we no longer assume a Markov process or a finite search space. At the same time, the proof is more compact than before.

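Theorem 2 (Simplified Drift Theorem) Let $X_t$, $t \ge 0$, be real-valued random variables describing a stochastic process over some state space. Suppose there exist an interval $[a, b] \subseteq \mathbb{R}$, two constants $\delta, \varepsilon > 0$ and, possibly depending on $\ell := b - a$, a function $r(\ell)$ satisfying $1 \le r(\ell) = o(\ell/\log(\ell))$ such that for all $t \ge 0$ the following two conditions hold:

1. $E(X_{t+1} - X_t \mid \mathcal{F}_t \wedge a < X_t < b) \ge \varepsilon$,

2. $\mathrm{Prob}(|X_{t+1} - X_t| \ge j \mid \mathcal{F}_t \wedge a < X_t) \le \dfrac{r(\ell)}{(1+\delta)^j}$ for $j \in \mathbb{N}_0$.

Then there is a constant $c^* > 0$ such that for $T^* := \min\{t \ge 0 : X_t \le a \mid \mathcal{F}_0 \wedge X_0 \ge b\}$ it holds that $\mathrm{Prob}(T^* \le 2^{c^* \ell / r(\ell)}) = 2^{-\Omega(\ell / r(\ell))}$.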
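Consider, as an illustration of what Theorem 2 guarantees, a hypothetical ±1 walk (our example, not from the text). It moves up with probability 2/3 and down with probability 1/3, so its drift is ε = 1/3. Since $|X_{t+1} - X_t| \le 1$, the second condition holds with δ = 1 and r(ℓ) = 2. The sketch below computes $\mathrm{Prob}(T^* \le L)$ exactly by propagating the distribution of $X_t$, with a = 0 and b = ℓ. The helper name is hypothetical.

```python
def prob_hit_within(ell: int, steps: int, p_up: float = 2 / 3) -> float:
    """Exact Prob(T* <= steps) for a +-1 walk started at b = ell with target a = 0.

    dist[x] holds Prob(X_t = x and the walk has not yet reached a = 0).
    """
    size = ell + steps + 2                     # positions 0 .. ell + steps + 1
    dist = [0.0] * size
    dist[ell] = 1.0
    absorbed = 0.0
    for _ in range(steps):
        new = [0.0] * size
        for x in range(1, size - 1):
            mass = dist[x]
            if mass:
                new[x + 1] += mass * p_up
                new[x - 1] += mass * (1 - p_up)
        absorbed += new[0]                     # mass that just reached a = 0
        new[0] = 0.0
        dist = new
    return absorbed


if __name__ == "__main__":
    for ell in (5, 10, 20, 40):
        print(ell, prob_hit_within(ell, steps=500))
```

By the gambler's-ruin formula, each printed value is at most $2^{-\ell}$. This matches the exponential decay in ℓ that the theorem predicts for processes meeting both conditions.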

Proof: We will apply Theorem 1 for suitable choices of its variables, some of which might depend on the parameter $\ell = b - a$ denoting the length of the interval $[a, b]$. The following argumentation is also inspired by Hajek's work.

Fix $t \ge 0$. For notational convenience, we let $\Delta := (X_{t+1} - X_t \mid \mathcal{F}_t \wedge a < X_t < b)$ and omit stating the filtration $\mathcal{F}_t$ hereinafter. To prove Condition (*), it is sufficient to identify values $\lambda := \lambda(\ell) > 0$ and $p(\ell) > 0$ such that

$$E(e^{-\lambda\Delta}) \le 1 - \frac{1}{p(\ell)}.$$



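Using the series expansion of the exponential function, we get

$$E(e^{-\lambda\Delta}) = 1 - \lambda E(\Delta) + \lambda^2 \sum_{k=2}^{\infty} \frac{\lambda^{k-2} E((-\Delta)^k)}{k!} \le 1 - \lambda E(\Delta) + \lambda^2 \sum_{k=2}^{\infty} \frac{\lambda^{k-2} E(|\Delta|^k)}{k!}.$$

Since all terms of the last sum are non-negative, we obtain for all $\gamma \ge \lambda$

$$E(e^{-\lambda\Delta}) \le 1 - \lambda E(\Delta) + \frac{\lambda^2}{\gamma^2} \sum_{k=0}^{\infty} \frac{\gamma^k E(|\Delta|^k)}{k!} \le 1 - \lambda\varepsilon + \lambda^2 \cdot \underbrace{\frac{E(e^{\gamma|\Delta|})}{\gamma^2}}_{=:C(\gamma)},$$

where the last inequality uses the first condition in the theorem. Given any $\gamma > 0$, choosing $\lambda := \min\{\gamma, \varepsilon/(2C(\gamma))\}$ results in

$$E(e^{-\lambda\Delta}) \le 1 - \lambda\varepsilon + \lambda \cdot \frac{\varepsilon}{2C(\gamma)} \cdot C(\gamma) = 1 - \frac{\lambda\varepsilon}{2} = 1 - \frac{1}{p(\ell)}$$

with $p(\ell) := 2/(\lambda\varepsilon) = \Theta(1/\lambda)$, since only $\lambda$ but not $\varepsilon$ is allowed to depend on $\ell$.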
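The choice of λ in the proof can be tried on a concrete step distribution. The sketch below takes a finite distribution of Δ and computes $C(\gamma) = E(e^{\gamma|\Delta|})/\gamma^2$. It then sets $\lambda := \min\{\gamma, \varepsilon/(2C(\gamma))\}$ and returns $E(e^{-\lambda\Delta})$ together with $1 - \lambda\varepsilon/2$. By the argument above, the latter is an upper bound whenever $E(\Delta) \ge \varepsilon$. The helper `check_lambda_choice` and the example distributions are hypothetical.

```python
import math


def check_lambda_choice(gamma: float, steps: dict, eps: float):
    """Return (lam, E(e^{-lam*Delta}), 1 - lam*eps/2) for a finite step distribution.

    steps maps each value of Delta to its probability; eps must satisfy eps <= E(Delta).
    """
    mgf_abs = sum(p * math.exp(gamma * abs(d)) for d, p in steps.items())
    c_gamma = mgf_abs / gamma ** 2             # C(gamma) = E(e^{gamma|Delta|}) / gamma^2
    lam = min(gamma, eps / (2 * c_gamma))      # guarantees lam <= gamma, as the proof needs
    lhs = sum(p * math.exp(-lam * d) for d, p in steps.items())
    return lam, lhs, 1 - lam * eps / 2


if __name__ == "__main__":
    steps = {+1: 2 / 3, -1: 1 / 3}            # drift E(Delta) = 1/3
    for gamma in (0.1, 0.5, 1.0, 2.0):
        lam, lhs, rhs = check_lambda_choice(gamma, steps, eps=1 / 3)
        print(f"gamma={gamma}: lambda={lam:.5f}, E(e^(-lambda*Delta))={lhs:.8f} <= {rhs:.8f}")
```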

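The aim is now to choose $\gamma$ in such a way that $E(e^{\gamma|\Delta|}) = O(r(\ell))$. Using Lemma 1 with $f(x) := e^{\gamma x}$ and $x_{\min} := 0$, we get

$$E(e^{\gamma|\Delta|}) \le \sum_{j=0}^{\infty} e^{\gamma(j+1)}\, \mathrm{Prob}(|\Delta| \ge j) \le \sum_{j=0}^{\infty} e^{\gamma(j+1)}\, \frac{r(\ell)}{(1+\delta)^j},$$

where the last inequality holds by the second condition of the theorem.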

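Choosing $\gamma := \ln(1 + \delta/2)$, which does not depend on $\ell$ since $\delta$ is a constant, yields

$$E(e^{\gamma|\Delta|}) \le r(\ell)\left(1 + \frac{\delta}{2}\right) \sum_{j=0}^{\infty} \left(\frac{1 + \delta/2}{1 + \delta}\right)^j = r(\ell)\left(1 + \frac{\delta}{2}\right)\left(2 + \frac{2}{\delta}\right) \le r(\ell) \cdot (4 + \delta + 2/\delta).$$

Hence $C(\gamma) \le r(\ell)(4 + \delta + 2/\delta)/\ln^2(1 + \delta/2)$, which means $C(\gamma) = O(r(\ell))$ since $\delta$ is a constant. By our choice of $\lambda$, we have $\lambda \ge \varepsilon/(2C(\gamma)) = \Omega(1/r(\ell))$, since $\varepsilon$ is also a constant. Since $p(\ell) = \Theta(1/\lambda)$, we know that $p(\ell) = O(r(\ell))$, too. Condition (*) of Theorem 1 has been established along with these bounds on $p(\ell)$ and $\lambda = \lambda(\ell)$.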
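The bookkeeping of the constants can be checked numerically. The sketch below uses helper names of our own. It evaluates $\gamma = \ln(1+\delta/2)$, the closed form of the geometric series, and the resulting bound on $C(\gamma)$, and from these λ and p(ℓ). λ is taken from the upper bound on $C(\gamma)$. This only makes λ smaller, so $\lambda^2 C(\gamma) \le \lambda\varepsilon/2$ still holds.

```python
import math


def sdt_parameters(eps: float, delta: float, r: float):
    """Constants from the proof of Theorem 2 for given eps, delta and r = r(ell).

    Returns gamma, the bound on E(e^{gamma|Delta|}), the bound on C(gamma), lambda and p.
    """
    gamma = math.log(1 + delta / 2)
    mgf_bound = r * (1 + delta / 2) * (2 + 2 / delta)   # closed form of the geometric series
    c_bound = mgf_bound / gamma ** 2
    lam = min(gamma, eps / (2 * c_bound))
    p = 2 / (lam * eps)
    return gamma, mgf_bound, c_bound, lam, p


def series_value(delta: float, r: float, terms: int = 5000) -> float:
    """Partial sum of sum_j e^{gamma (j+1)} r / (1+delta)^j with gamma = ln(1 + delta/2)."""
    q = (1 + delta / 2) / (1 + delta)
    return r * (1 + delta / 2) * sum(q ** j for j in range(terms))


if __name__ == "__main__":
    for r in (1, 10, 100):
        _, _, _, lam, p = sdt_parameters(eps=0.5, delta=1.0, r=r)
        print(f"r={r}: lambda={lam:.6f}, p={p:.1f}, lambda*r={lam * r:.6f}")
```

The output shows λ·r(ℓ) staying constant as r(ℓ) grows, i.e., $\lambda = \Theta(1/r(\ell))$ and $p(\ell) = \Theta(r(\ell))$ for these inputs.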

To bound the probability of a success within $L(\ell)$ steps, we still need a bound on $D(\ell) = \max\{1, E(e^{-\lambda(X_{t+1} - b)} \mid X_t \ge b)\}$. If 1 maximizes the expression, then $D(\ell) \le r(\ell)$ follows, since $r(\ell) \ge 1$. Otherwise,

$$D(\ell) = E(e^{-\lambda(X_{t+1} - b)} \mid X_t \ge b) \le E(e^{-\lambda(X_{t+1} - X_t)} \mid X_t \ge b) \le E(e^{\lambda|X_{t+1} - X_t|} \mid X_t \ge b) \le E(e^{\gamma|X_{t+1} - X_t|} \mid X_t \ge b),$$

where the first inequality follows from $X_t \ge b$, the second one from $-x \le |x|$, and the third one from $\gamma \ge \lambda$. The last term can be bounded as in the above calculation leading to $E(e^{\gamma|\Delta|}) = O(r(\ell))$, since that estimation uses only the second condition, which holds conditional on $X_t > a$. Hence, in any case, $D(\ell) = O(r(\ell))$ as well. Altogether, we have

$$e^{-\lambda\ell} \cdot D(\ell) \cdot p(\ell) = e^{-\Omega(\ell/r(\ell))} \cdot O((r(\ell))^2) = e^{-\Omega(\ell/r(\ell)) + O(\log(r(\ell)))} = 2^{-\Omega(\ell/r(\ell))},$$

where the last simplification follows since $r(\ell) = o(\ell/\log(\ell))$ by prerequisite. Choosing $L(\ell) = 2^{c^* \ell / r(\ell)}$ for some sufficiently small constant $c^* > 0$, Theorem 1 yields

$$\mathrm{Prob}(T(\ell) \le L(\ell)) \le L(\ell) \cdot 2^{-\Omega(\ell/r(\ell))} = 2^{-\Omega(\ell/r(\ell))},$$

which proves the theorem. □
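The final step can be traced numerically. The Python sketch below is our own helper, not from the paper. It evaluates $\log_2$ of the bound $e^{-\lambda\ell} \cdot L(\ell) \cdot D(\ell) \cdot p(\ell)$ from Theorem 1 using the constants of the proof. It bounds $D(\ell)$ by $\max\{1, r(\ell)(1+\delta/2)(2+2/\delta)\}$. It also takes the illustrative choice $c^* := \lambda r(\ell)/(2\ln 2)$, which is a constant here because λ is proportional to $1/r(\ell)$.

```python
import math


def log2_failure_bound(ell: float, r: float, eps: float, delta: float) -> float:
    """log2 of e^{-lambda*ell} * L * D * p from Theorem 1, with the constants of the proof.

    Uses L(ell) = 2^{c* ell / r} with the illustrative choice c* = lambda*r / (2 ln 2),
    i.e. L = e^{lambda*ell/2}, and bounds D(ell) by max{1, bound on E(e^{gamma|Delta|})}.
    """
    gamma = math.log(1 + delta / 2)
    mgf_bound = r * (1 + delta / 2) * (2 + 2 / delta)
    lam = min(gamma, eps / (2 * mgf_bound / gamma ** 2))
    p = 2 / (lam * eps)
    d = max(1.0, mgf_bound)
    c_star = lam * r / (2 * math.log(2))
    log2_L = c_star * ell / r
    return -lam * ell / math.log(2) + log2_L + math.log2(d) + math.log2(p)


if __name__ == "__main__":
    for r in (1.0, 10.0):
        for ell in (1e4, 1e5):
            print(r, ell, log2_failure_bound(ell, r, eps=0.5, delta=1.0))
```

The exponent decreases linearly in ℓ at a rate proportional to $1/r(\ell)$, plus an offset of order $\log r(\ell)$.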



4 Previous Applications 

Obviously, the new condition does not affect RLS with constant-size neighborhoods. Hence, the stronger condition immediately holds for [17] in the context of simulated annealing. We would expect that the condition is already met in all the proofs with standard bit mutation. In particular, when at least j bits must be flipped to decrease the distance from the target by an amount j, generally j bit-flips are also required to increase the distance by j. This implies that the proofs immediately carry over to the new, stronger condition. In this case, the condition holds for δ = 1 and r(ℓ) = 2:

$$\mathrm{Prob}(|X_{t+1} - X_t| \ge j) \le \binom{n}{j}\left(\frac{1}{n}\right)^j \le \frac{1}{j!} \le 2\left(\frac{1}{2}\right)^j.$$
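This chain of inequalities can be checked exactly with rational arithmetic. The sketch assumes, as in the argument, that a jump of length j requires at least j bits to flip. It bounds that event by a union bound over all j-subsets of the n bits, each bit flipping independently with probability 1/n. The helper name is hypothetical.

```python
from fractions import Fraction
from math import comb, factorial


def jump_bound_chain(n: int, j: int):
    """The three quantities in C(n, j) (1/n)^j <= 1/j! <= 2 (1/2)^j, computed exactly."""
    flip_bound = Fraction(comb(n, j), n ** j)   # union bound over all j-subsets of bits
    return flip_bound, Fraction(1, factorial(j)), 2 * Fraction(1, 2) ** j


if __name__ == "__main__":
    for j in range(6):
        a, b, c = jump_bound_chain(100, j)
        print(j, float(a), float(b), float(c))
```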

This is the case for Theorems 5, 6 and 7 of [14, 15], which respectively consider the (1+1)-EA on the Needle-in-a-haystack function and the (1+1)-EA with fitness-proportional selection on linear functions and on OneMax. With at most small variations, the same argument also applies to the following:

• [3], regarding the (1+1)-EA for monotone functions;

• [5, 8], concerning multi-objective problems;

• [10, 11], related to computing unique input output sequences in software engineering;

• [16], for dynamic optimisation;

• [1], concerning Estimation of Distribution Algorithms (EDAs).

Although the calculations in [13] (i.e., for vertex cover instances) do not assume that exactly j bits have to flip to decrease the distance by j, the bounds obtained there for decreasing the distance by j also apply to increasing the distance by j. The same holds for [9], where the calculations for the stronger condition remain the same as long as the Chvátal bound in the opposite direction is applied (i.e., $\Pr[X \le E(X) - r\delta] \le \exp(-2\delta^2 r)$ [2]).
For some previous applications, a bound on the probability of performing jumps that increase the distance from the optimum needs to be derived. This is the case for the following:

• the maximum matching application (i.e., Theorem 8) of [14, 15];

• Theorem 3 of [4], analysing a fitness-diversity mechanism;

• Theorem 8 of [12], analysing a fitness-proportional EA;

• Theorem 7 of [18], analysing the (1,λ)-EA for OneMax.

For the last of these, the Simplified Drift Theorem considering self-loops that was devised in [18] also needs to be adjusted to the stronger second condition. In the following, we discuss how to derive the missing bounds and show that the stronger condition holds with simple calculations.

4.1 Maximum Matching (Theorem 8, [14, 15])

The probability of decreasing the length of the augmenting path by j was bounded by $(j+1)/m^{2j}$, a bound also used by Giel and Wegener in [6]. To prove the stronger condition, we also need to bound the probability that the augmenting path is lengthened by j. Since there are at most 2h choices to lengthen the path by 1, there are at most $(2h)^j$ different ways to lengthen the path by j. Given that 2 edges need to flip for each of the j lengthenings, the probability of performing a jump of at least length j is bounded by

$$\sum_{i=j}^{N} \frac{(2h)^i}{m^{2i}} \le N \cdot \frac{(2h)^j}{m^{2j}} \le \frac{1}{m^{j-1}},$$

where the first inequality is obtained by considering that the term of the sum is maximised for i = j. Then, by considering relevant steps (i.e., $p_{\mathrm{rel}} \ge 1/(em^2)$) as in the rest of the proof, Condition 2 with δ = 1 and r = 22 follows from

$$\min\left\{1, \frac{1/m^{j-1}}{1/(em^2)}\right\} = \min\left\{1, \frac{em^3}{m^j}\right\} \le \min\left\{1, \frac{8e}{2^j}\right\} \le \frac{22}{2^j}$$

for $m \ge 2$.
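The inequalities in this bound can be checked numerically. The sketch assumes the chain $\sum_{i=j}^{N} (2h)^i/m^{2i} \le N(2h)^j/m^{2j} \le 1/m^{j-1}$ together with $p_{\mathrm{rel}} \ge 1/(em^2)$. For the last step of the chain it also assumes $N \le m$ and $2h \le m$; these are our assumptions, not statements from the original proof. The helper names are hypothetical.

```python
import math
from fractions import Fraction


def lengthening_bounds(h: int, m: int, N: int, j: int):
    """Return (sum_{i=j}^{N} (2h)^i / m^{2i}, N (2h)^j / m^{2j}, 1 / m^{j-1}) exactly."""
    ratio = Fraction(2 * h, m * m)             # (2h choices) * (1/m)^2 per lengthening
    total = sum(ratio ** i for i in range(j, N + 1))
    return total, N * ratio ** j, Fraction(1, m) ** (j - 1)


def relevant_step_bound(m: int, j: int) -> float:
    """min{1, (1/m^(j-1)) / (1/(e m^2))} = min{1, e * m^(3-j)}."""
    return min(1.0, math.e * float(m) ** (3 - j))


if __name__ == "__main__":
    for j in range(1, 8):
        print(j, relevant_step_bound(2, j), 22 / 2 ** j)
```

The case m = 2 is the tightest one, since $8e \approx 21.75$ is just below 22.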



4.2 Diversity with fitness duplicates (Theorem 3, [4])

The probability of increasing the potential of the population $\varphi(P_t)$ by j needs to be derived (i.e., $P(\Delta\varphi = +j)$). It turns out that most of the calculations for bounding the probability in the opposite direction apply.

In order to increase the potential by j, it is necessary to select some $y_k$ with $k \le \varphi$ and flip $\varphi - k + j$ 1-bits into 0-bits. The probability is:

and the rest of the proof follows exactly the same calculations as for the drift condition in the opposite direction.



4.3 (1,λ)-EA Analysis (Theorem 7, [18])

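The probability of performing jumps of length j away from the optimum of OneMax is

$$p_{k,k+j} \le \lambda \binom{n}{j}\left(\frac{1}{n}\right)^j \le \frac{\lambda}{j!} \le 2\lambda\left(\frac{1}{2}\right)^j.$$

The probability of self-loops is bounded by

$$p_{k,k} = 1 - \left(1 - \left(1 - \frac{1}{n}\right)^n\right)^{\lambda} \ge 1 - \left(1 - \frac{1}{2e}\right)^{\lambda} \ge \frac{1}{2e}.$$

Hence

$$\frac{p_{k,k+j}}{p_{k,k}} \le 4e\lambda\left(\frac{1}{2}\right)^j,$$

and the condition is established with δ = 1 for any λ such that $r(\ell) := 4e\lambda$ satisfies $1 \le r(\ell) = o(\ell/\log(\ell))$.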
Since in Theorem 7 of [18] $\lambda \le (1 - \varepsilon)\log_{e/(e-1)} n$, the Simplified Drift Theorem implies a runtime of at least $2^{\Omega(n^{\varepsilon/2}/\log n)}$ with probability $1 - 2^{-\Omega(n^{\varepsilon/2}/\log n)}$. This is a log n factor weaker than the statement of Theorem 7 of [18], where jumps away from the target are not considered.
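The elementary inequalities behind these bounds can be checked directly. The sketch below evaluates the jump bound exactly. It also evaluates $1 - (1 - (1 - 1/n)^n)^{\lambda}$, the probability that at least one of the λ offspring is a copy of its parent under standard bit mutation. This is compared with $1 - (1 - 1/(2e))^{\lambda}$ and $1/(2e)$ for n ≥ 2 and λ ≥ 1. The helper names and parameter ranges are our own.

```python
import math
from fractions import Fraction
from math import comb, factorial


def jump_away_chain(n: int, j: int, lam: int):
    """Exact values of lam*C(n,j)/n^j, lam/j! and 2*lam*(1/2)^j."""
    return (lam * Fraction(comb(n, j), n ** j),
            Fraction(lam, factorial(j)),
            2 * lam * Fraction(1, 2) ** j)


def clone_probability(n: int, lam: int) -> float:
    """1 - (1 - (1 - 1/n)^n)^lam: probability that at least one of lam offspring is a clone."""
    return 1 - (1 - (1 - 1 / n) ** n) ** lam


if __name__ == "__main__":
    n, lam = 100, 5
    q = clone_probability(n, lam)
    print(f"clone probability {q:.4f} >= 1/(2e) = {1 / (2 * math.e):.4f}")
    for j in range(1, 6):
        print(j, [float(v) for v in jump_away_chain(n, j, lam)])
```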



4.4 Fitness-Proportional EA (Theorem 8, [12]) 

The fitness-proportional EA (PEA) analysed on OneMax in [12] works with a population of size μ, fitness-proportional selection, and mutation as the only variation operator. It is proved for all polynomial population sizes that the algorithm is ineffective, using an appropriately defined potential function with drift away from the target and a small probability of jumping towards the target. However, we have noticed that very large jumps away from the target are possible even for the smallest non-trivial population size of μ = 2. For instance, the analysis does not exclude situations where the two individuals $x_1$ and $x_2$ differ strongly in their fitness, e.g., $|x_1| = 0.999n$ and $|x_2| = 0.5n$. The potential function $g(x_1, x_2)$ used in the paper scales fitness values exponentially. Suppose selection chooses $x_2$ for reproduction twice, which has probability $(0.5/1.499)^2 > 1/10$. Then the offspring population would consist of two individuals with a number of one-bits close to n/2. It is easy to see that this reduces the potential drastically, which corresponds to a step away from the target that is larger than allowed by the corrected drift theorem.
We can overcome this anomaly by replacing PEA with a modified algorithm PEA' with stronger selection pressure. Let $x_1, \dots, x_\mu$ be the individuals of the current population and assume w.l.o.g. that $f(x_1) \ge \dots \ge f(x_\mu)$. Let $f_{\mathrm{tot}} = f(x_1) + \dots + f(x_\mu)$. A generation of PEA' does the following, where "mutate" means the usual standard bit mutation.

1. Mutate $x_1$ and add the result to the offspring population.

2. For $t = 2, \dots, \mu$:

(a) Choose a parent x, where

• the probability of choosing $x_1$ is $\dfrac{f(x_1)}{f_{\mathrm{tot}}} - \dfrac{1}{\mu}$,

• the probability of choosing $x_i$ is $\dfrac{f(x_i)}{f_{\mathrm{tot}}} + \dfrac{1}{\mu(\mu-1)}$ for $2 \le i \le \mu$.

(b) Mutate x and add the result to the offspring population.

In other words, the best individual is selected and mutated at least once. This prevents the case outlined above where the potential of the offspring population drops drastically compared to the parent population. In the following steps, the other individuals are selected with higher probability than in the original algorithm. It is easy to verify that PEA' always uses a well-defined probability distribution on the population when creating the offspring population. Its optimization time on OneMax is also stochastically at most as large as that of the original PEA.

The analysis in [12] relies on the random variables $S_i$, where $S_i$ is the number of times $x_i$ is chosen for mutation. Since it is proved that $f(x_i)/f_{\mathrm{tot}} \le 2/\mu$ for $1 \le i \le \mu$, it follows immediately for the original PEA that $E(S_i) \le 2$, which is crucial for the rest of the analysis. We show that $E(S_i) \le 2$ also holds for PEA'. First we get

$$E(S_1) = 1 + (\mu - 1)\frac{f(x_1)}{f_{\mathrm{tot}}} - \frac{\mu - 1}{\mu} \le \mu\frac{f(x_1)}{f_{\mathrm{tot}}} \le 2.$$

The first inequality uses $f(x_1)/f_{\mathrm{tot}} \ge 1/\mu$, which holds since $x_1$ is a best individual. The second one uses $f(x_1)/f_{\mathrm{tot}} \le 2/\mu$. For $i \ge 2$, we get

$$E(S_i) \le (\mu - 1)\frac{f(x_i)}{f_{\mathrm{tot}}} + \frac{1}{\mu} \le (\mu - 1) \cdot \frac{2}{\mu} + \frac{1}{\mu} < 2.$$

Now the rest of the original analysis can be carried over.

The authors will use the arguments outlined above in an extended journal submission of their paper [12]. This submission is currently under preparation.
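The selection rule of PEA' can be checked with exact rational arithmetic. The sketch assumes that step 2(a) picks $x_1$ with probability $f(x_1)/f_{\mathrm{tot}} - 1/\mu$. It picks each $x_i$, $i \ge 2$, with probability $f(x_i)/f_{\mathrm{tot}} + 1/(\mu(\mu-1))$. We also assume the μ − 1 iterations of step 2 choose independently. This reading is consistent with the expressions for $E(S_1)$ and $E(S_i)$. Fitness values are assumed positive and sorted non-increasingly, and the helper names are hypothetical.

```python
from fractions import Fraction
from typing import List


def pea_prime_probs(fitness: List[int]) -> List[Fraction]:
    """Step 2(a) parent-choice probabilities of PEA' (fitness sorted non-increasingly, mu >= 2)."""
    mu = len(fitness)
    f_tot = sum(fitness)
    first = Fraction(fitness[0], f_tot) - Fraction(1, mu)
    others = [Fraction(f, f_tot) + Fraction(1, mu * (mu - 1)) for f in fitness[1:]]
    return [first] + others


def expected_selections(fitness: List[int]) -> List[Fraction]:
    """E(S_i): x_1 is mutated once in step 1, then each of the mu - 1 choices is i.i.d."""
    mu = len(fitness)
    return [(1 if i == 0 else 0) + (mu - 1) * p
            for i, p in enumerate(pea_prime_probs(fitness))]


if __name__ == "__main__":
    fitness = [9, 7, 7, 5, 4]
    print([str(p) for p in pea_prime_probs(fitness)])
    print([str(e) for e in expected_selections(fitness)])
```

The probabilities always sum to 1 and are non-negative because $x_1$ has maximal fitness. Whenever every $f(x_i)/f_{\mathrm{tot}} \le 2/\mu$, every $E(S_i)$ is at most 2.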